Introduction

In this document, we’ll go through the many many different steps I’ve taken in an attempt to analyze several years of waterbird surveys at Estero La Cruz in Bahía de Kino, Sonora. Rather than only include the final model and results, I started this document when I began working with the data– so this document has evolved and grown alongside me as I worked through the various problems/errors/roadbloacks I encountered. While initally working with the data, I tried to use multiple variables to take advantage of the various data we collect at each survey. However, I’ve learned a lot about modeling and ultimately found myself following the advice to keep it all as simple as possible. By the end of this document, you’ll see that a lot of time and effort has been put into this to address a simple question- Has the number of shorebirds at Estero La Cruz changed over time?

Los Datos

The first steps are loading all required packages and of course—the data. This may seem like a long list of packages, but I’ve only added in the important ones that I know are being applied. Any additional packages that those on this list are dependent upon will also be loaded.

All the necessary files should be saved in the working directory. The files we’ll Counts of birds in each guild for each survey at Estero La Cruz from 2013/14 to present cycle and these are “environment variables” taken at each survey. Similar to the biodiversity

Here’s a sneak peak of the data–as you can see it’s set up with each survey effort as a row with every guild as the first columns. After the guilds, we then have the environmental variables.

Preview of Guild Data
Date shore gulls_terns pel_corm land waders waterfowl Cloud Temp Precip
2013-09-12 867 315 286 12 34 3 5 80 0
2013-09-27 2129 1996 363 4 50 1 2 80 0
2013-10-11 793 587 506 6 18 0 0 72 0
2013-10-25 1243 1508 459 1 57 0 0 75 0
2013-11-07 2704 1473 4527 6 97 98 0 77 0

For the sake of this document, we’ll fowllow all steps with the shorebirds and then present the results. Then, I’ll show some of the same necessary steps but mostly just present the results from the other guilds.

NBGLMERs

To model the data, I decided to use mixed models aka GLMMs or GLMERs.

EXLPAIN AND DESCRIBE MODELS BETTEER TO EXPLAIN WHAT THE EFFECTS ARE

random can be intercept or rate of change-

Here we’ll use shorebirds as the example. I started playing with the idea of mixed models because they are a common tool used in biological modeling. Given the data we’re wworking with and question we’re trying to answer, the random effect that needs to be taken into account with this modeling is the observer(s). With several years of data and different people counting birds over the years, that’s probably one of the biggest sources of variation in the data—and hence random effect of the observer(s)!!

Why is observer the random effect?

This is the observer because we want to calculate and take into account the variance of the observer, but we do not necessarily want to quantify/estimate the impact of each individual observer/group of observers–and for this, it is the random effect.

Why Negative Binomial?

So knowing that we wanted to use mixed models with count data, we have a few different types of GLMERs that we can run. Below I have selected negative binomial models because it is the best fit that allows the overdispersion in the data. For example, a Poisson model cannot be used because it assumes the variance is equal to the mean. Well in our dataset, the variance is much much greater than the mean (hence overdispersion) so a Poisson model would not be appropriate.

Shorebirds

Like I mentioned above, we’ll use shorebirds as the example if the work that I’ve done. And after, I’ll share the results from my modeling of the other guilds.

Preliminary Plotting

We’ll start off by making a couple graphs to get an idea of the data we’re working with. Some things to note and keep in mind about the data are two very important things. 1. “temporada” is essentially year. But it is not named “year” because it’s the yearly work cycle/season. That translate to the year “14” is actually fall 2013-summer 2014, etc. 2. “season” is a categorical variable that has two levels: 1=fall/spring and 2=winter.

It’s always good to take a look at the date. For example here you can see that there’s quite a bit of variability in the counts in the years.

So here is my model:

## $residDev
## [1] 91.94035
## 
## $residDF
## [1] 110
## 
## $ratio
## [1] 0.8358214
## 
## $P.ChiSq
## [1] 0.893584
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: Negative Binomial(1.062)  ( log )
## Formula: shore ~ temporada + (1 | obs)
##    Data: guilds
## 
##      AIC      BIC   logLik deviance df.resid 
##   1955.3   1966.2   -973.6   1947.3      109 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.0262 -0.6600 -0.2132  0.3616  3.3033 
## 
## Random effects:
##  Groups Name        Variance Std.Dev.
##  obs    (Intercept) 0.3017   0.5493  
## Number of obs: 113, groups:  obs, 42
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) 10.50299    1.26539   8.300   <2e-16 ***
## temporada   -0.18762    0.07606  -2.467   0.0136 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##           (Intr)
## temporada -0.991

Above I included a test for overdispersion since that was a big concern throughout my initial efforts in trying to model the data. As you can see, the ratio of the deviation to DF (of the residuals) is fairly close to 1. The P-value of the Chi-squared comparing the residual deviance and degrees of freedom to a chi-square distribution is pretty close to 1, which is good. This means we are not dealing with overdispersion.

Above you can also look over the summary of the model. Just from an intial look at this model summary, it looks like there is a significant decline in the abundance of shorebirds over time. It also appears, as expected, that there is a significant increase in the abundance of birds in the winter months in comparison to the rest of the year. Before we saying anything more, let’s do a handful of other steps to check the model and make sure this will be our final model. Stay tuned below first for some model selection work and later some further interpretation.

Model Selection

Here I’ll show some of the steps I took when trying to select a model. The model above is a pretty good fit for what we’re doing, but there are some other thigns I could include in the model. To make sure that the first model above is indeed the best for our data, we’lll go through a series of steps to gain some confidence. It’s not quite as pretty as using something like the dredge() function from the MuMln package (I couldn’t get it to work with a glmer.nb() model), but as far as I understand model selection, we can do some of the same things but one step at a time.

First we’ll make a null model and compare it to our first simple model that we made that includes

## Data: guilds
## Models:
## null.shore: shore ~ 1 + (1 | obs)
## glmernb.shore: shore ~ temporada + (1 | obs)
##               npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)  
## null.shore       3 1958.4 1966.6 -976.19   1952.4                       
## glmernb.shore    4 1955.3 1966.2 -973.64   1947.3 5.1006  1    0.02392 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Since there is a significant difference, this means we can reject the null model and accept the simple negative binomial mixed model as a betterr fit to the data. I also feel OK about that given the AICc/BIC are slightly lower, and the log-Liklihood has moved a bit closer to 0 (which is another one of many things to look for).

Selecting a Model

Now let’s add in some other potential variables that would affect the presence of birds, in this case the season and the tide! Here tide is split into two variables- tide height (low/mid/high) and tide direction (rising/slack/falling).

We can make a neat model selection table based on information selection criteria. Here we will use the second order AIC (AICc) for comparison. For the model names, I abbreviated the names into “modX” so note that “mod” = “glmernb.shore”, “mod1” = “glmernb.shore1”, “mod2” = “glmernb.shore2”, etc…

## 
## Model selection based on AICc:
## 
##       K    AICc Delta_AICc AICcWt Cum.Wt      LL
## mod1  5 1928.79       0.00   0.40   0.40 -959.12
## mod3  7 1929.16       0.37   0.33   0.74 -957.05
## mod2  8 1930.45       1.65   0.18   0.91 -956.53
## mod4 10 1931.81       3.02   0.09   1.00 -954.83
## mod   4 1955.65      26.86   0.00   1.00 -973.64
## null  3 1958.60      29.81   0.00   1.00 -976.19

From this, our glmernb.shore1 model seemingly performs the best.

Let’s do another check by comparing increasingly complex models using ANOVAs. Again, if the Chi-squared value is significant, that means that the model is a better fit. We will set up comparisons between our simplest model (glmernb.shore) and each of the increasingly complex models. While we can use the code to set up single comparisons between models (ie. anova(mod1, mod2)) we can also combine it into one line of code such as anova(mod, mod1, mod2, ...) and we will get a table that compares our simplest model (mod1) with the rest of the models.

npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
glmernb.shore 4 1955.278 1966.188 -973.6392 1947.278 NA NA NA
glmernb.shore1 5 1928.231 1941.868 -959.1156 1918.231 29.047155 1 0.0000001
glmernb.shore3 7 1928.096 1947.187 -957.0479 1914.096 4.135420 2 0.1264751
glmernb.shore2 8 1929.062 1950.881 -956.5311 1913.062 1.033537 1 0.3093293
glmernb.shore4 10 1929.656 1956.930 -954.8280 1909.656 3.406285 2 0.1821103

Now let’s try it again, but starting with our best model so far, glmernb.shore1.

npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
glmernb.shore1 5 1928.231 1941.868 -959.1156 1918.231 NA NA NA
glmernb.shore3 7 1928.096 1947.187 -957.0479 1914.096 4.135420 2 0.1264751
glmernb.shore2 8 1929.062 1950.881 -956.5311 1913.062 1.033537 1 0.3093293
glmernb.shore4 10 1929.656 1956.930 -954.8280 1909.656 3.406285 2 0.1821103

It looks like our best model was a good choice and we feel good about the glmernb.shore1 model and use it with confidence. That’s because what each of these tables show, is that with adding in the height and/or direction of the tide, we do not get a better fitting model. That means we should stick to our simple model that has the effect of work cycle/year and season, as it is the best fit.

Another potential approach to model selection would be to average the models. But rather than follow more coomplicated steps to do that, I think it’s OK to stick with this process of selecting the best model

Calculating R^2

Now that we have selected a model, one interesting piece of information about the model is checking the R^2 value. While this measure is commonly used in lms and glms it can be applied to mixed models. Here the function will calculate the R^2 values taking into account only the fixed effects (marginal) as well as all effects (conditionall) using 3 different methods. Here’s the explanataion taken from the help("r.squaredGLMM"):

Marginal R_GLMM² represents the variance explained by the fixed effects, and is defined as: \[R_GLMM(m)² = (σ_f²) / (σ_f² + σ_α² + σ_ε²)\] Conditional R_GLMM² is interpreted as a variance explained by the entire model, including both fixed and random effects, and is calculated according to the equation: \[R_GLMM(c)² = (σ_f² + σ_α²) / (σ_f² + σ_α² + σ_ε²)\] where σ_f² is the variance of the fixed effect components, σ_α² is the variance of the random effects, and σ_ε² is the “observation-level” variance.

So with that we can make a table of these values:

##                 R2m       R2c
## delta     0.2803543 0.4955452
## lognormal 0.3210871 0.5675431
## trigamma  0.2281494 0.4032695

From the this, it looks like our model does a pretty decent job of fitting the data and explaining the variance. Given we’re modeling change over time and not taking into account other environmental variables that are likely affecting the presence/absence/abundance of birds, I feel okay about believing this model.

Check Residuals

We’ll run more or less the same code as we used above. See the included .r file for the raw code.

Again, it looks more or less the same as before. Our residuals aren’t perfectly randomly distributed but I’m not sure what I could do at this point to get a better fit. Given the process to get to this model, I think it’s acceptable for the sake of this analysis.

Plotting Model Predictions

Now it’s time to look at our model predictions. One of the most important aspects of a regression are the estimates of each variable in the model. Since we’re working with negative binomial models, there is a log link meaning the effect estimates need to be log transformed. We can do that easily by using R as a fancy calculator. Before we do that, let’s take a quick look at the different estimates of the model.

The first thing you might notice is that the intercept is very high- yes but it’s not as important. If we think about the data we’re modeling, there are counts of shorebirds in the thousands, and in the first year of data (2013-2014 season), there is a very high degree of variability.
What we really want to pay more attention to are the estimates of two fixed effects of the model. Just by looking at this grarph, the effect of season is a positive increase while that of temporada(which is the year/cycle) is negative.
If you recall from the summary output of the model the estimate of the variable temporada was: -0.2139745. To translate that into normal terms, as the year/cycle changes by 1, the change in shorebird abundance is multipled by a factor of 0.8073726. This can be interpreted into that “the abundance of shorebirds is declining by 19.2627394% each year.” One way we can visualize this is by plotting the predicted values on a graph. We can plot the overall trend of the model. Note that this will back-transform the predicted values to the original scale of our response variable.

However, we know there are differences between the season and that there was a significant effect in the model. From the model estimate of that variable, we can say winter season sees an increase of 298.1387504% in the number of shorebirds in comparison to the rest of the year. Knowing that the difference is so great, let’s take a look at the same graph as above, but grouped by season. Recall that the seasons are split into 1: Fall/Sping and 2: Winter.

Note that both graphs above are the same data, just plotted differently.

In both graphs, the seasons are split into 1 (Fall/Sping–the migration seasons) in purple and 2 (Winter) in blue. These colors will be consistent throughout the rest of this document. This shows that while shorebird abundance is declining in both seasons, it’s clear that the decline is more pronounced in the winter-we can visually see this as the slope of the line is steeper. This could potentially be due a decline in the migratory species overwintering, so it would be a good idea to look further into the migratory species.

Plot the random effects. And also a diagnostic plot: > For generalized linear mixed models, returns the QQ-plot for random effects.

So diagnostic plot is essentially just a qq plot

## $obs
## `geom_smooth()` using formula 'y ~ x'

The first plot shows the random effects. The red line indicates no effect. The blue indicates a positive effect while purple is negtative. The second plot above, as mentioned, is a qqplot.

## Warning: Ignoring unknown parameters: width, conf.int

Final Results of Other Guilds

Below, I present the results of the other guilds.

Waterfowl

Both during my model exploration with shorebirds as well as with other guilds, I ran into a handful of problems. One common problem I ran into was a failure for the model to converge. This convergence failure essentially means that the model fits the data extremely poorly, or the observations just don’t match up. But this isn’t the end of the world. For example, when I tried to apply a negative binomial mixed model to the waterfowl guild data, I ran into this problem. Below is an example of how I handled it.

## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## Model failed to converge with max|grad| = 0.00631856 (tol = 0.002, component 1)
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: Negative Binomial(0.5939)  ( log )
## Formula: waterfowl ~ temporada + season + (1 | obs)
##    Data: guilds
## 
##      AIC      BIC   logLik deviance df.resid 
##   1158.7   1172.4   -574.4   1148.7      108 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -0.7671 -0.6636 -0.3612  0.3186  3.0002 
## 
## Random effects:
##  Groups Name        Variance Std.Dev.
##  obs    (Intercept) 0.2323   0.482   
## Number of obs: 113, groups:  obs, 42
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  9.54422    1.55325   6.145 8.01e-10 ***
## temporada   -0.36895    0.09658  -3.820 0.000133 ***
## season2      1.29676    0.30908   4.195 2.72e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##           (Intr) temprd
## temporada -0.987       
## season2    0.219 -0.318

Luckily, I found this neat guide that I followed and found helpful. Below I follow the steps of the aformentioned guide and fix the problem. What the steps below basically do is try to optimize the model, starting from the inital fit but increasing the number of iterations.

## [1] 0.4820253
## [1] 8.6121e-06
## [1] 8.6121e-06

No more failure to converge! So let’s check for overdispersion and look at the summary output of the model.

## $residDev
## [1] 81.82569
## 
## $residDF
## [1] 109
## 
## $ratio
## [1] 0.7506944
## 
## $P.ChiSq
## [1] 0.9758231
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: Negative Binomial(0.5939)  ( log )
## Formula: waterfowl ~ temporada + season + (1 | obs)
##    Data: guilds
## Control: glmerControl(optCtrl = list(maxfun = 20000))
## 
##      AIC      BIC   logLik deviance df.resid 
##   1158.7   1172.4   -574.4   1148.7      108 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -0.7671 -0.6636 -0.3613  0.3186  3.0001 
## 
## Random effects:
##  Groups Name        Variance Std.Dev.
##  obs    (Intercept) 0.2323   0.482   
## Number of obs: 113, groups:  obs, 42
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  9.54401    0.55262  17.270  < 2e-16 ***
## temporada   -0.36894    0.03602 -10.242  < 2e-16 ***
## season2      1.29669    0.30533   4.247 2.17e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##           (Intr) temprd
## temporada -0.888       
## season2   -0.019 -0.281

It looks like there is a significcant decline in the abundance of waterfowl over time. It also appears, as expected, that there is a significant increase in in the winter months in comparison to the rest of the year.

So now we can follow the steps we took with the shorebirds which means we’ll start by plotting the data.

Recall our model glmernb.fowl2 from directly above.
Similar to before, we can once again calculate the R^2 values. This time we’ll show it in neat table.

R2m R2c
delta 0.2931394 0.3782012
lognormal 0.3952562 0.5099498
trigamma 0.1673828 0.2159532

Once again, it’s looking like we’ve got a pretty decent looking model.

Let’s check the residuals

And now let’s look at the model predctions. Using the estimate of our model, we can say there has been a decline of 30.8533103% over the seasons and an increase of 365.7134803% in the winter.

Similar to the shorebirds, we can see a similar difference in that the decline seems much more stark in the winter.

Gulls/Terns/Skimmers

Check the data

Make the model

## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: Negative Binomial(1.1854)  ( log )
## Formula: gulls_terns ~ temporada + season + (1 | obs)
##    Data: guilds
## 
##      AIC      BIC   logLik deviance df.resid 
##   1805.0   1818.6   -897.5   1795.0      108 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.0678 -0.7161 -0.2335  0.6357  4.2453 
## 
## Random effects:
##  Groups Name        Variance Std.Dev.
##  obs    (Intercept) 0.08125  0.285   
## Number of obs: 113, groups:  obs, 42
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  7.34682    0.98663   7.446 9.59e-14 ***
## temporada   -0.04662    0.05946  -0.784  0.43309    
## season2      0.57935    0.19179   3.021  0.00252 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##           (Intr) temprd
## temporada -0.988       
## season2    0.028 -0.105
## $residDev
## [1] 94.99911
## 
## $residDF
## [1] 109
## 
## $ratio
## [1] 0.8715514
## 
## $P.ChiSq
## [1] 0.8280607

Once again, it looks likee there is a significant decline in the abundance of gulls/terems/skimmers over time as well as a significant increease in the winter due to the influx of migrants. From the model estimates, there is a decline of 4.554998% of the in gulls/terns/skimmers guild. While this isn’t as exaggerated as the other guilds, it is still something worth noting.

Calculate R^2 values

R2m R2c
delta 0.0851377 0.1654172
lognormal 0.1105094 0.2147128
trigamma 0.0590585 0.1147470

Review the residuals

Plot the model predictions

Here we can visualize that the decline is not as steep as the other guilds, but still present. We can also once again see the notable differences between the winter and non-winter seasons.

Pelicans/Cormorants/Allies

Plot the data

Making the model:

## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## unable to evaluate scaled gradient
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## Model failed to converge: degenerate Hessian with 1 negative eigenvalues
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: Negative Binomial(1.4891)  ( log )
## Formula: pel_corm ~ temporada + season + (1 | obs)
##    Data: guilds
## Control: glmerControl(optCtrl = list(maxfun = 20000))
## 
##      AIC      BIC   logLik deviance df.resid 
##   1433.4   1447.0   -711.7   1423.4      108 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.1437 -0.6468 -0.3666  0.2948  4.0194 
## 
## Random effects:
##  Groups Name        Variance Std.Dev.
##  obs    (Intercept) 0.4335   0.6584  
## Number of obs: 113, groups:  obs, 42
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  8.42046    0.20036  42.026  < 2e-16 ***
## temporada   -0.22764    0.01202 -18.942  < 2e-16 ***
## season2      0.87525    0.20483   4.273 1.93e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##           (Intr) temprd
## temporada -0.510       
## season2   -0.282 -0.280
## $residDev
## [1] 114.2199
## 
## $residDF
## [1] 109
## 
## $ratio
## [1] 1.047889
## 
## $P.ChiSq
## [1] 0.3471449

Pelicans/cormorants/allies guild is declining by 20.3589084% each year.

Calculate R^2 values

R2m R2c
delta 0.2305754 0.5309768
lognormal 0.2593109 0.5971499
trigamma 0.1934467 0.4454756

Plot model predictions Something to note here is the axis. If you look at the boxplots above, you’ll see there are two very high counts (ploted as dots aka outliers) in the 2013-2014 season. Now that can very obviously be skewing the data, but take a second to look at the axis of our graphed model predictions and you’ll see that it’s much smaller.

Long-legged Waders

The only guild that lacks a final model

## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## Model failed to converge with max|grad| = 0.00645303 (tol = 0.002, component 1)
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: Negative Binomial(2.6765)  ( log )
## Formula: waders ~ temporada + season + (1 | obs)
##    Data: guilds
## 
##      AIC      BIC   logLik deviance df.resid 
##   1175.5   1189.1   -582.8   1165.5      108 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.3878 -0.6319 -0.1972  0.4254  3.2289 
## 
## Random effects:
##  Groups Name        Variance Std.Dev.
##  obs    (Intercept) 0.2503   0.5003  
## Number of obs: 113, groups:  obs, 42
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  6.02105    1.05489   5.708 1.14e-08 ***
## temporada   -0.11036    0.06322  -1.746   0.0809 .  
## season2     -0.06425    0.14351  -0.448   0.6543    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##           (Intr) temprd
## temporada -0.993       
## season2   -0.038 -0.018
## $residDev
## [1] 95.30029
## 
## $residDF
## [1] 109
## 
## $ratio
## [1] 0.8743146
## 
## $P.ChiSq
## [1] 0.822308

As you can see, this model failed to onverge and I haven’t worked on getting a better fit yet

Final Table of All Models

We can put a summary of all the important parts of each model into a single table. Note that here the “estimates” of each variable have been back transformed. For example, in the shorebird model, rather than display the estimate of “-0.21397” for temporada (aka the year), the table will show the back transformed intercept (recall that for a negative binomial model we have to use the exp() function) so the number displayed is the incidence rate ratio of 0.8073726. This simply means that as that variable increases by 1 unit, the response variable is multipled by a factor of 0.8073726. Also note that the R^2 values displayed are calculated using the log-normal method that was shown in the tables above.

  Shorebirds Waterfowl Gulls/Terns/Skimmers Pelicans/Cormorants/Allies
Predictors Incidence Rate Ratios Conf. Int (95%) P-Value Incidence Rate Ratios Conf. Int (95%) P-Value Incidence Rate Ratios Conf. Int (95%) P-Value Incidence Rate Ratios Conf. Int (95%) P-Value
Intercept 31436.98 2794.32 – 353676.27 <0.001 13960.80 4726.26 – 41238.53 <0.001 1551.26 224.32 – 10727.61 <0.001 4538.97 3064.83 – 6722.15 <0.001
Year 0.81 0.70 – 0.93 0.004 0.69 0.64 – 0.74 <0.001 0.95 0.85 – 1.07 0.433 0.80 0.78 – 0.82 <0.001
Season 2.98 2.06 – 4.32 <0.001 3.66 2.01 – 6.65 <0.001 1.78 1.23 – 2.60 0.003 2.40 1.61 – 3.58 <0.001
Random Effects
σ2 0.55 0.99 0.61 0.52
τ00 0.32 obs 0.23 obs 0.08 obs 0.43 obs
ICC 0.36 0.19 0.12 0.46
N 42 obs 42 obs 42 obs 42 obs
Observations 113 113 113 113
Marginal R2 / Conditional R2 0.321 / 0.568 0.395 / 0.510 0.111 / 0.215 0.259 / 0.597

Migrants and Residents

Another important grouping to make in the bird is between the migrantory and resident species for many reasons. One interesting reason just based off the guild muilds is that when looking at the plotted model predictions, it seemed that in general, the decrease in birds over time was more pronounced in the winter season.

## Warning: 112 parsing failures.
## row    col expected actual            file
##   1 Precip a double     -- 'mig_final.csv'
##   2 Precip a double     -- 'mig_final.csv'
##   3 Precip a double     -- 'mig_final.csv'
##   4 Precip a double     -- 'mig_final.csv'
##   5 Precip a double     -- 'mig_final.csv'
## ... ...... ........ ...... ...............
## See problems(...) for more details.
##  [1] "Date"        "migrants"    "residents"   "Cloud"       "Temp"       
##  [6] "Precip"      "Wind"        "temporada"   "szn"         "obs"        
## [11] "tide_height" "tide_dir"    "mes"         "mes.yr"      "season"

Migrants

Start with migrants. Plot data:

Make the model

## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: Negative Binomial(1.9745)  ( log )
## Formula: migrants ~ temporada + season + (1 | obs)
##    Data: mig
## 
##      AIC      BIC   logLik deviance df.resid 
##   1807.2   1820.8   -898.6   1797.2      108 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.3566 -0.5788 -0.1004  0.4177  2.9619 
## 
## Random effects:
##  Groups Name        Variance Std.Dev.
##  obs    (Intercept) 0.2397   0.4896  
## Number of obs: 113, groups:  obs, 42
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  8.68772    1.08189   8.030 9.74e-16 ***
## temporada   -0.13459    0.06472  -2.080   0.0376 *  
## season2      0.78086    0.15731   4.964 6.91e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##           (Intr) temprd
## temporada -0.992       
## season2   -0.050 -0.012

12.5925791% decrease over the years 218.3349139% higher in the winter

Plot the residuals

Calculate R^2 values

R2m R2c
delta 0.2065679 0.4611194
lognormal 0.2302981 0.5140921
trigamma 0.1782552 0.3979173

Plot model predictions

First we’ll look at the effects of year on the model, with the other effects considereed constants.

And now compare that graph to the model, but dividing out the season as we

Residents

Repeat workflow Plot data:

Make the model

## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: Negative Binomial(2.2867)  ( log )
## Formula: residents ~ temporada + season + (1 | obs)
##    Data: mig
## 
##      AIC      BIC   logLik deviance df.resid 
##   1669.8   1683.4   -829.9   1659.8      108 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.3347 -0.6801 -0.2256  0.4741  3.0598 
## 
## Random effects:
##  Groups Name        Variance Std.Dev.
##  obs    (Intercept) 0.3366   0.5802  
## Number of obs: 113, groups:  obs, 42
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  6.76939    1.19320   5.673 1.40e-08 ***
## temporada   -0.05652    0.07160  -0.789     0.43    
## season2      0.82429    0.15937   5.172 2.31e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##           (Intr) temprd
## temporada -0.993       
## season2   -0.020 -0.033

Residents, although decreasing it’s not significant. While they aren’t necessarily increasing either, this means that the birds are probably fluctuating year to year but overall are not changing. This is good!! Also, similar to all other groups, the numbers of residents increases 228.0261205% in the winter months.

Plot the residuals

Calculate R^2 values

R2m R2c
delta 0.1814257 0.5366360
lognormal 0.1970360 0.5828095
trigamma 0.1625532 0.4808134

Plot model predictions As we can see in the model prediction graphs above- our confidence intervals are very large. And though the trend is a decrease over time, there is not a lot to support that statement. So in looking at these graphs, we can see how much overlap there is in the points and confidence intervals.

Final Table of Mig Models

Repeating what we did, we can create a model summary table for our two. And once again, what were the intercepts of the model have been back transformed into incidence rate ratios.

  Migrants Residents
Predictors Incidence Rate Ratios Conf. Int (95%) P-Value Incidence Rate Ratios Conf. Int (95%) P-Value
Intercept 5929.65 711.42 – 49423.56 <0.001 870.78 84.00 – 9027.43 <0.001
Year 0.87 0.77 – 0.99 0.038 0.95 0.82 – 1.09 0.430
Season 2.18 1.60 – 2.97 <0.001 2.28 1.67 – 3.12 <0.001
Random Effects
σ2 0.41 0.36
τ00 0.24 obs 0.34 obs
ICC 0.37 0.48
N 42 obs 42 obs
Observations 113 113
Marginal R2 / Conditional R2 0.230 / 0.514 0.197 / 0.583

This is a clean way to summarize some stuff.

Some Thoughts

Given that most of the guilds seem to be trending downward. Let’s take a look once again at the predicted values of the models for each guild. In each of the models, the starting point was much higher in the winter and seemingly decreased more notably.

One reason I had moved from working with biodiversity indices to working on this modeling project was because it was clear that important aspect of the survey data was missing. For the calculating the biodiversity indices for example, birds had to be identified to species. This completely glosses over the impact of larger aggregations of birds that are identified to a group but not necessarily species (ie estimations of “1000 peeps”). In separting the birds into guilds, this important aspect of data that can be captured.

So in looking at our model, it doesn’t seem as if residents are changing but migrants are. This might suggest that the overall declines across the bird guilds are unequally attributed to declines in migrants overwintering. This could be a big jump to make, but it’s something that seemingly makes sense. BUT ALSO with the separation between migrant/resident spp, I did not include these unidentified birds. Despite this, we still saw some interesting changes and that migrants are following these trends to be declinging.

But finally, a big thing to keep in mind is that there is a lot of variability in all our data over time so all results should be taken with a grain of salt.

Future Directions

While this was fun, the work isn’t done. For example, there are several other cool questions that can be answered. Part of the problem is the surveys lack a consistent protocol. For example, I do now know how people in 2015 measured the tide or temperature. Gathering better data on historic weather/climate conditions would be interesting to look at changes or potential correlations between temp and bird relative abundances. Though we take data on the temperature during each monitoring effort, examining climate from a wider lens would be interesting. Precipitation patterns or extreme temperature events could potentially be affecting the phenology of migration in some birds that may be using temp/precip patterns as environmental cues. While some temperature and precipitation data are available here, I found the data in the area to be incomplete and lacking the last few years. Another interesting direction would be to relate the abundances of shorebirds (and/or other guilds) to tides. While we take data on the tides, it would be possible to more accurately characterize the tide.